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Stencil computations, involving operations over the elements of an array, are a common program- 
ming pattern in scientific computing, games, and image processing. As a programming pattern, 
stencil computations are highly regular and amenable to optimisation and parallelisation. However, 
general-purpose languages obscure this regular pattern from the compiler, and even the programmer, 
preventing optimisation and obfuscating (in)correctness. This paper furthers our work on the Ypnos 
domain-specific language for stencil computations embedded in Haskell. Ypnos allows declarative, 
abstract specification of stencil computations, exposing the structure of a problem to the compiler 
and to the programmer via specialised syntax. In this paper we show the decidable safety guaran- 
tee that well-formed, well-typed Ypnos programs cannot index outside of aiTay boundaries. Thus 
indexing in Ypnos is safe and run-time bounds checking can be eliminated. Program information is 
encoded as types, using the advanced type-system features of the Glasgow Haskell Compiler, with 
the safe-indexing invariant enforced at compile time via type checking. 



1 Introduction 



Stencil computations, otherwise known as stencil codes ['29] p22\] or structured grid computations lUl, 
are a ubiquitous pattern in programming, particularly in scientific computing, games, image processing, 
and similar applications. Stencil computations are characterised by array operations where the elements 
of an output array are computed from corresponding elements, and their surrounding neighbourhood 
of elements, in an input array, or arrays. An example is the discrete Laplace operator which, for two- 
dimensional problems, can be written in C as the following, for input array A and output array B: 



for (int i=l; i<(N+l); i++) { 
for (int j=l; j<(M+l); j++) { 

B[i][j] = A[i+l][j] + A[i-l][j] + A[i][j + 1] + A[i][j-1] - 4*A[i][j] 

} 

} 



where A and B are arrays of size {N+2) x (M+2) with indices ranging from (0,0) to (A'^ + 1,M+ 1). The 
iteration space of the f or-loops ranges from (1,1) to {N,M) where the elements outside of the iteration 
space provide a halo of the boundary conditions for the operation. The discrete Laplace operator is an 
example of a convolution operation in image processing lUl. 

Indexing in C is unsafe: any indexing operation may, without any static warnings, address memory 
"outside" the allocated memory region for A or B, causing a program crash or, even worse: affecting 
the numerical accuracy of the algorithm without crashing the program. Bounds-checked array access 
is provided by many languages and libraries to prevent unsafe program behaviour due to out-of-bounds 
access. However, bounds checking every array access has considerable performance overhead |[T6l : we 
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measured an overhead of roughly 20% per iteration for safe-indexing over unsafe-indexing in Haskell 
for a two-dimensional discrete Laplace operator on a 512 x 512 image (see Appendix [A| for details). 

If it is known, or can be proved, that memory accesses are within the bounds of allocated regions, 
then costly bounds checking can be safely eliminated. In the presence of general, or random, indexing, 
such as in most language and libraries, automatic proof of the absence of out-of-bounds access is in 
general undecidable as indices may be arbitrary expressions. 

This paper presents our latest work on Ypnos, an embedded domain-specific language in Haskell. 
Ypnos programs express abstract, declarative specifications of stencil computations which are guaranteed 
free from out-of-bounds array access, and thus array indexing operations without bounds checking can be 
used by the implementation. Ypnos lacks general array indexing but instead provides a form of relative 
indexing via a novel syntactic construction called a grid pattern, as introduced in ifTTl . Grid patterns 
provide decidable compile-time information about array access. 

Our previous paper said little about boundary conditions and safety, instead focussing on expressivity 
and parallelisation. In this paper, we introduce language constructs in Ypnos for specifying the boundary 
conditions of a problem and the mechanism by which array indexing is guaranteed safe. Grid patterns 
are a typed construction which encode array access information in the types. Safe indexing is enforced 
via type checking at compile time, facilitated by Haskell type system features such as type classes 1711201 
and more advanced type system features found in the Glasgow Haskell Compiler (GHC) including: gen- 
eralised algebraic data types (GADTs) |[T9llT8ll and type families pP]. Dependent types have been used 
to enforce safe indexing in more general settings 1*251 1281 . However, a well-developed dependent type 
system is not required to enforce safe indexing in Ypnos due to the relative indexing scheme provided 
by the grid pattern syntax; instead GADTs suffice. 

This paper was inspired by related work on the Repa array library for Haskell ifTOl . The most recent 
paper |T4 1 introduces a mechanism for the abstract specification of convolution masks and a data structure 
for describing stencils. Given an array with adequate boundary values a stencil application function 
can safely elide bounds checking. We believe that the Ypnos offers greater flexibility and expressivity, 
particularly for higher-dimensionality problems and complex boundary conditions, facilitated by grid 
patterns and Ypnos's approach to boundaries and safety (see Section[5]for more on Repa). 

Section |2] introduces the core aspects of the Ypnos language relevant to an end-user. We do not 
discuss all Ypnos features here. In particular we elide reductions, iterated stencil computations, and 
automatic parallelisation, details of which can be found in IfTTl . Section |3]discusses details of the Ypnos 
implementation, in particular the type-level techniques used for encoding and enforcing safety. The 
desugaring of Ypnos's specialised syntax is also informally described in this section. Section[4]provides 
a quantitative analysis of the effectiveness of Ypnos for producing correct, efficient stencil programs, 
along with performance numbers. Section [5] considers related work of other DSLs and languages for 
array programming. Section |6] discusses further work and Section [7] concludes with a discussion on the 
relevance of the techniques presented in this paper to the wider DSL audience. Appendix [C] provides a 
proof sketch of the soundness of the approach to safety presented in this paper. 

The Ypnos implementation, and the source code for the programs used in this paper, can be down- 
loaded from |http : //github . com/dorchard/ypnos 

A deep knowledge of Haskell is not required to understand the paper, although knowledge of ML 
or other functional language is helpful. A brief introduction to the more advanced GHC/Haskell type 
system features employed in the implementation is provided in Appendix [B] Throughout, types starting 
with a lower-case letter are type variables and are implicitly universally quantified; types starting with 
an upper-case letter are type constructors, which are (possibly nuUary) constructors e.g. Int. 



70 



Efficient and Correct Stencil Computations via Pattern Matching and Static Typing 



2 Introduction to Ypnos 

We introduce the main aspects of Ypnos that would be relevant to an end-user, using the example of the 
two-dimensional discrete Laplace operator: 



[dimensions I X, Y |] 

laplace2D = [fun I X*Y: I _ t _ I 

I 1 @c r I 

|_ b _|->t+l+r+b- 4.0*c |] 

laplaceBoundary = [boundary I Double from (-1, -1) to (+1, +1) -> 0.0 |] 

grid = listGrid (Dim X :* Dim Y) (0, 0) (w, h) img_data laplaceBoundary 
grid' = runA grid laplace2D 



Here img_data, w, and h are free variables defined elsewhere, providing the data of a grid as a list of 
double-precision floating point values, and the horizontal and vertical size of the grid respectively. 

Ypnos mostly comprises library functions thus the syntax of Ypnos programs is largely that of 
Haskell. However there is some important Ypnos-specific syntax written inside of quasiquoting brackets 
|[T5l and expanded by macro functions, with the forms: 



[dimension I ...ypnos code... |] 
[fun I ...ypnos code... |] 
[boundary! ...ypnos code... |] 



The macros essentially give an executable semantics to the specialised syntax of Ypnos (discussed further 
in Section[3]l. Within the quasiquoted brackets any expression following an arrow symbol -> is a Haskell 
expression i.e. in Lisp terminology, expressions following -> are backquoted. 

2.1 Grids and Dimensions 

Ypnos is built around a central data type of w-dimensional immutable arrays. The term grid is used 
instead of array to escape implementational connotations. There are a number of grid constructors, one 
of which is used in the above example: listGrid, which takes five arguments: a dimension term, the 
lower extent of the grid, the upper extent of the grid, a list of element values, and a structure describing 
the grid's boundary behaviour. 

Dimension terms define the dimensionality of a grid, naming the dimensions. Any dimension names 
used in a program must first be declai^ed in a single declaration via the [dimension I ... I ] macro, 
e.g. in the example the X and Y dimensions are declared. A dimension term is constructed from one or 
more dimension identifiers, prefixed by Dim constructors, where many dimension can be composed by 
the binary : * dimension-tensor operation. 

2.2 Indexing: Grid Patterns 

Ypnos has no general indexing operations on grids. Indexing is instead provided by a special pattern 
matching syntax called a grid pattern, first introduced in the earlier Ypnos paper [17]. A grid pattern is 
a group of variable or wildcard patterns which are matched onto a subset of elements in a grid relative to 
a particular element. For example, the following is a one-dimensional grid pattern on the X dimension: 
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X: I 1 @ c r I 



This grid pattern, which matches on a one-dimensional grid of dimension X, binds the variable pattern 
c to the element of the grid which is indexed by a cursor index internal to the grid data structure. The 
cursor is the index of the current iteration of an array operation. The variable patterns 1 and r are 
bound to elements relative to the cursor element i.e. 1 to the preceding element and r to the succeeding 
element. The above grid pattern is analogous to the following bindings in C: 1 = A [i-1] ; c = A [i] ; 
r = A [i+1] ; for an array A and index i, where i is the cursor index. 

Every grid pattern must contain exactly one sub-pattern prefixed by @ which is matched to the grid 
element indexed by the grid's cursor. The relative lexical position of the remaining patterns to the cursor 
pattern determines to which element in the grid each pattern should be matched. 

One-dimensional grid patterns can be nested to provide ?i-dimensional patterns. For example, a 
pattern match on a two-dimensional grid, of dimensionality Dim X : * Dim Y, can be defined: 



Y: I X: I It @lc lb I 
@ X: I ct @cc cb I 
X : I rt @ rc rb I I 



Here the outer grid pattern has exactly one pattern prefixed by @ and the inner grid patterns also have 
exactly one pattern marked by @. Grid patterns are prefixed by a dimension term to disambiguate 
the dimension being matched upon by a stencil. For syntactic convenience, Ypnos provides a two- 
dimensional grid pattern syntax which is considerably easier to read and write than nested patterns. In 
two-dimensional grid pattern syntax, the above pattern can be written: 



X*Y: I It Ic lb I 
I ct @ cc cb I 
I rt rc rb I 



The layout of the grid pattern syntax is essentially pictorial in its representation of a stencil's access 
pattern, where the spatial relationships of patterns match the spatial relationships of data. 

The above access pattern is known as a nine-point two-dimensional stencil. The Laplace example 
uses a five-point stencil where a wildcard pattern is used to ignore the corner elements. 

Grid patterns restrict the programmer to relative array-indexing, expressed as a static construction 
that requires no evaluation or reduction. Grid patterns thus provide decidable compile-time information 
about the access pattern of a program without the need for analysis further than parsing. Because in- 
dexing is static, the access pattern of a stencil function can be encoded as a type (discussion further in 
Section [3]). The fun macro allows a stencil fiinction to be defined by a grid pattern and, following the 
arrow symbol ->, a Haskell expression which defines the body of the stencil function. 

2.3 Applying Stencil Functions and Boundaries 

There are several functions for applying stencil functions to grids. The example uses runA which applies 
a stencil function at each position in the parameter grid by instantiating the internal cursor index of the 
grid to each index in the range (0, 0) to (w — 1 , h — 1) (inclusive), writing the results into a new grid. This 
range of indices is called the extent of the grid. 
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The grid pattern of laplace2D accesses indices (/ — 1, j), (/ + 1, j), (/, j — 1), and (/, j + 1) relative 
to the cursor index (/, j) thus application of the stencil function at indices around the edge of the grid, 
e.g. (0,0), results in out-of-bounds access. Errors or undefined behaviour are avoided by specification 
of boundary behaviour by laplaceBound, which is passed to the listGrid constructor. 

Boundary behaviour can be specified in a number of ways inside the boundary quasiquoted macro. 
In the example we defined the boundary behaviour by: 



[boundary I Double from (-1, -1) to (+1, +1) -> 0.0 |] 



which specifies a one-element deep boundary region, or halo, around a grid of element type Double 
with a default value of 0.0. The from ... to . . . syntax is itself short-hand for a more verbose 
description specifying smaller sub-regions of the boundary. The above is short-hand for the following: 



[boundary 1 Double (-1, 


-1) 


-> 








(*i, 


-1) 


-> 








(+1, 


-1) 


-> 








(-1, 


*j) 


-> 








(+1, 


*j) 


-> 








(-1, 


+1) 


-> 








(*i, 


+1) 


-> 








(+1, 


+1) 


-> 





|] 



Each case defines the value of a boundary region where the left-hand side of -> is a region descriptor. 
Figure[T]gives a pictorial representation of boundary regions, and their descriptors, for a two-dimensional 
grid with a one-element boundary region. Note, *v denotes a variable binding for a variable v. 

a= (-1, -1) e= (+1, *j) 

b= -1) /= (-1, +1) 

c= (+1, -1) g= (*i, +1) 

d= (-1, *j) h= (+1, +1) 



Figure 1: Boundary region descriptors for a two-dimensional grid with a one element-deep boundary 

The grammar of boundary terms is the following, where n are natural numbers greater than one, e are 
Haskell expressions, and v are variables: 

B ::= from Ii to I2 -> e \ I -> e \ I g -> e 

I :■= P \ iP, P) I iP, P, P) I iP, P, P, P) I ... 

P :\= -n \ +n \ *v 

A boundary definition consists of an explicit element type and a number of B terms which specify either 
a range of boundary regions with the same value for the elements of each region or a specific boundary 
region. The three forms are called the range form, the specific form, and the specific parameterised form. 
The terminal +n denotes a boundary region n elements after the upper extent of a grid dimension; -n 
denotes a boundary region n elements before the lower extent of a grid dimension; *v denotes any index 
inside the extent of a grid dimension where the value of the index is bound to v (see Figure [T]l. 
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As an example of a more complicated boundary definition, the following specifies the boundary of a 
two-dimensional grid where the "left" side of the grid (i.e. regions a, d, and / in Figure [TJ has default 
value 1.0 to a depth of one element, the "right" side (i.e. regions c, e, and h in Figure[T]l has default value 
2.0, one-element deep, and the remaining edges have default value 0.0, one-element deep: 



[boundary I Double from (-1, -1) to (-1, +1) -> 1 . 
from (+1, -1) to (+1, +1) -> 2.0 
(*i, -1) -> 0.0 
(*i, +1) -> 0.0 |] 



More complex boundary behaviour such as reflecting or wrapping boundaries can be described by the 
specific parameterised form which is the same as the specific form with an additional parameter for the 
grid for which the boundary is being defined. As an example of the specific parameterised form, the 
following defines the boundary of a two-dimensional grid where the "top" edge reflects the values inside 
the grid to a depth of one element, the "left" and "right" edges are wrapped, the "bottom" edge has a 
constant value up to a depth of two elements, and the corner cases, (—1,-1) and (+1,-1), reflect the 
corresponding comer elements inside the grid: 



[boundary 1 Double (*i, 


-1) g -> 


g! 


!(i, 0) 






— top 


(-1, 


*j) g -> 


g! 


! (f St (size 


g) - 


1. j) 


— left 


(+1. 


*j) g -> 


g! 


!(0, j) 






— right 


from 


(-1, +1) 


to 


(+1, +2) -> 


0.0 




— bottom 


(-1, 


-1) g -> 


g! 


!(0, 0) 






— top corners 


(+1, 


-1) g -> 


g! 


! (f St (size 


g) - 


1, 0) |] 





The (!!!) operator is a safe general indexing operation (i.e. out-of-bounds access raises an exception) 
that is only available within a boundary expression. Bounds checks for ( ! ! ! ) do not have a significant 
effect on program performance as such a boundary is a relatively small percentage of the overall grid 
size and is only calculated once per application of a stencil function to update the boundary values from 
the updated grid values. Note, a boundary without any parameterised regions is constructed just once 
overall as its values do not, and cannot, depend on a grid's inner values. 

The boundary definition of a grid must provide sufficient cover for a stencil function such that the 
stencil function does not access an element which is outside of the grid or boundary region. Since grid 
patterns and boundary definitions are static this safety property can be checked statically. Any program 
without appropriate boundaries for a stencil apphcation is rejected. 



3 Statically Enforcing Safe Indexing via Types 

Ypnos provides efficiency and correctness guarantees by enforcing safe indexing statically via typing. 
Thus, well-typed Ypnos programs are safe, and bounds-check free. 

Type-level information in Ypnos is constructed, composed, and manipulated using various type- 
system features of Haskell and the Glasgow Haskell Compiler. Generalised algebraic data types (GADTs) 
lfT9l [TSll are used as a form of lightweight dependent-typing, propagating information from data-level to 
type-level via the constructors of a data type. Type families |4| are used as simple type-level functions 
to manipulate/compose type-level information. Type classes |7| are used as predicates and relations on 
types. Appendix [Bjprovides a brief introduction to these type-system features for the unfamiliar reader. 
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We introduce here the core aspects of the Ypnos implementation that are most relevant to enforcing 
safe indexing. This section also explains, mostly by example, the desugaring of Ypnos syntax into 
Haskell. We begin with an informal high-level overview of how safety is enforced before delving into 
details and definitions. So far, Ypnos programs have been typeset in monospace to present a clear view 
of the concrete syntax. The Haskell code shown here is typeset more elaborately to ease reading. 



3.1 Safety Overview 

The key to enforcing safety in Ypnos is the encoding of relative indices at the type-level. Type-level 
relative indices are used in the types of stencil functions and grids in two different ways: respectively, to 
encode the relative indexing pattern of a stencil function and to encode the relative position of boundary 
regions with respect to the edge of a grid. 

The type of a grid is parameterised by type-level information about the grid's boundary regions. The 
grid data type takes four type parameters: 

Grid d b dyn a 



where <i is a dimensionality type, is a type-level list of the relative indices (with respect to the grid's 
edge) of the grid's boundary regions, and a is the element type of the grid {dyn will be discussed later). 

Stencil functions include in their type the relative indices accessed by the function, as described by 
a grid pattern. Grid patterns are desugared via the fun macro into relative indexing operations on a grid, 
with type approximately of the form: 

index : : Safe ib ^ i ^ Grid d b dyn a 



where index takes a relative index of type / and a grid of element type a, returning a single value of type 
a. The Safe i b constraint enforces safety by requiring that there are sufficient boundary regions described 
by b such that a relative index / accessed from anywhere within the grid's extent has a defined value. 

The boundary information that parameterises a grid type (type parameter b above) is generated from a 
boundary description used in constructing a grid. The boundary macro desugars a boundary description 
into a special data structure which some grid constructors take as a parameter. 

The rest of this section is structured as follows: Section 3.2 introduces dimensionality types, which 
determine the index type of a grid. Section 3.3 introduces absolute index types and the important relative- 
index type representation. Section |3.4| discusses boundary definitions and boundary types, which leads 



into Section 3.5 on grid constructors where boundary information propagates to grid types. Section 3.6 



brings together the different type-level information, showing how Safe is defined to match relative indices 
to grid boundaries. Section [3^ completes by showing application of stencil functions to grids. 



3.2 Dimensions 

As described above, the Grid data type has a type parameter representing the dimensionality of a grid. In 
the example of Section[2j one of the parameters to the constructor listGrid was a dimensionality term: 
Dim X : * Dim Y specifying that the grid is two-dimensional with named dimensions X and Y. 

Dimension terms are defined by the following GADT, with a constructor Dim for single dimensions 
and a constructor :* for the tensor of a single dimension and a dimension tem|^ 

'This tensor operator is thieoretically associative but tiie implementation is simplified by defining a right-associating operator. 
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data Dim d 

AsAsi{:*)dd' 

data Dimensionality d where 

Dim : : Dimldentifier d ^ d ^ Dimensionality [Dim d) 

( :* ) "Dimensionality [Dim d) — )• Dimensionality d' — )• Dimensionality [Dim d:* d') 

In the first two lines, tlie data types Dim and ( :* ) are empty declarations defining type constructors 
witliout data constructors. Tlie data constructors of Dimensionality reuse tliese name^ and use tliese 
empty types in the result type's parameter as type representatives for the data constructors. The types 
of dimensionality terms are singleton types, i.e. types with just one inhabitant, thus their type uniquely 
determines their value. For example, the term Dim X :* Dim Y has type Dimensionality (Dim X :* Dim Y). 

The Dimldentifier class (which constraints the Dim constructor) is that of valid dimension identifier 
types. Valid dimension identifiers are declared in Ypnos by a [dimension I ... |] macro, which 
expands a list of dimension identifiers into singleton data types, representing dimension identifiers, and 
instances of Dimldentifier e.g. 



[dimension I X, Y |] 



3.3 Absolute and Relative Indices 



dataX = X 
data Y=Y 

instance Dimldentifier X 
instance Dimldentifier Y 



Internally, grids are indexed by tuples of Int values. Given the dimensionality of a grid the type of 
absolute indices is calculated by the Index type family with instances (of which we show the first three): 

type family Index t 

type instance Index {Dim x) = Int 

type instance Index ( {Dim x) :* {Dim y) ) = {Int, Int) 

type instance Index {{Dim x) :* {{Dim y) :* {Dim z))) = {Int, Int, Int) 

Unfortunately, tuples in Haskell are not inductively defined thus a fully general implementation of Ypnos 
must have an instance of Index for every possible arity of dimensionality i.e. an infinite number. In prac- 
tice, many stencil computations do not require more than four dimensions. We discuss this issue further 
in Section|6]and in the rest of the paper give tuple-related definitions up to two or three dimensions. 

As mentioned, relative indices have a type-level encoding which is key to enforcing safety. Relative 
indices are given by tuples of a data type encoding of integers, IntT, providing a type-level representation 
of integers. IntT is defined in terms of a data type of inductively defined natural numbers, Nat, for 
which the result type of each data constructor provides a type-level representation of the natural number 
constructed, as defined by the following GADT: 

dataZ 
data S n 

data Nat n where 

Z-.-.NatZ 

Sv.Natn^Nat {S n) 



■^The data constructors could have been given different names to their type representatives, e.g. Dimldent d ~ 
Dimensionality [Dim d), but we find the correspondence between data constructors and type representatives more instructive. 
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Integers are defined by IntT in terms of Nat as eitfier a negative or positive natural number: 

data Neg n 
data Pos n 

data IntT n where 

Negy.Nat {S n) IntT {Neg {S n)) 
Pos -.-.Nat n IntT {Pos n) 

Note that zero has a unique representation: Pos Z. The expression Neg Z is not well-typed as the Neg 
data constructor expects a natural number of at least one i.e. of type Nat {S n). 

The Grid data type is parameterised by a type-level list of the relative positions of boundary regions. 
An example type-level list construction is the following GADT of heterogeneous lists: 

dataMZ 

data Cons x xs 

data List t where 
Nil ■.-.List Nil 

Cons ::x^ List xs — )• List {Cons x xs) 

The List type thus encodes the types of its elements in its type parameter f as a type-level list e.g. 

{Cons 1 {Cons "hello" {Cons 'a' Nil))) :: {List {Cons Int {Cons String {Cons Char Nil)))) 

The type-level representation of a grid's boundary regions uses the same type-level list representation, 
but the list contains relative indices describing the position of boundary regions relative to the grid's edge. 
For example, a one-dimensional grid with boundary regions of -1 and +1 has a type like: 

Grid {Dimd) {Cons {IntT {Neg {S Z))) {Cons {IntT {Pos {S Z))) Nil)) dyna 



3.4 Boundaries 



A relative index cannot be accessed safely from everywhere within a grid's extent unless the grid has 
appropriate boundary values, provided by a boundary definition. Ypnos boundary definitions are desug- 
ared by the boundary macro into a data structure describing the boundary which encodes in its type the 
positions of the boundary elements defined, relative to the edge of the grid. This type-level boundary 
information is propagated to a grid's type (the b type parameter in the Grid type). The Safe constraint in 
the type of relative indexing operations uses this boundary information to ascertain if a grid has adequate 
boundaries for the relative index to be safe when accessed from anywhere within the grid's extent. 



Section 2.3 introduced Ypnos's syntax for describing boundaries, with the grammar: 



B 
I 
P 



from Ii to I2 -> e \ I -> e \ I g -> e 

P I iP, P) I iP, P, P) I iP, P, P, P) 

-n \ +n \ *v 



The syntax, I g -> e, describes boundary values that are dependent on the contents of a grid g. Such 
a boundary is described as dynamic. After the application of a stencil function to a grid with a dynamic 
boundary, the boundary values must be recomputed so that its values are consistent with the grid's (pos- 
sibly changed) inner elements. Conversely, a non-parameterised boundary description is static, since it 
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does not depend on the values of the grid and thus does not need recomputing. Since the from . . . to . . . 
syntax is desugared into a number of specific forms ...->..., as described in Section [23] boundary 
definitions consist of a number of B terms of either the second or third syntactic form. 

Boundary definitions are desugared by the boundary macro into a BoundaryList structure which 
comprises a list of BoundaryFun values. Each B term in a boundary definition is desugared into a 
function mapping one or more boundary indices, determined by the region descriptor, to a value. From 
such functions a BoundaryFun value is constructed which encodes information in its type about the 
boundary region it defines. BoundaryFun has two constructors for static and dynamic regions: 



data Static 
data Dynamic 

data BoundaryFun d ix a dyn where 

Static ::{ix^a) 

Dynamic : : ( (/x, Grid d Nil Static a) 



— )• BoundaryFun d ix a Static 
a) —7- BoundaryFun d ix a Dynamic 



Both the Static and Dynamic constructors of BoundaryFun have a single parameter: a function mapping 
one or more boundary indices of type ix to a value of type a. In the case of a Dynamic boundary function, 
this index is paired with a grid of element type a. BoundaryFun has four type parameters: d the grid's 
dimensionality, ix the boundary indices type, a the element type, and dyn denoting whether the boundary 
is static or dynamic. 

The region descriptor of a boundary (non-terminal / in the grammar) determines the type of boundary 
indices. The components of a boundary index|^ are either relative or absolute: -n and +n are relative 
indices, relative to the lower or upper bound respectively of the dimension's extent, and *v is an absolute 
index (i.e. an Int value) within the the dimension's extent. The region descriptor syntax is desugared into 
pattern matches on relative/absolute indices in the following way: 



Syntax 



-n 
+n 
*v 



Pattern 



Neg M 
Pos M 
v 



Type 



IntT {NegM) 
lntT{Pos Inf) 
Int 



where [O]] Z 

s\n-n 

at both the data- and type-level. 



Figure [2] provides an example of the types of the boundary indices for the boundary regions of a 
two-dimensional grid with a one-element deep boundary. The following shows the desugaring of two 
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a = {IntT {Neg {SZ)), IntT {Neg {S Z))) e= {IntT {Pos {SZ)), Int) 

b = {Int, IntT {Neg {S Z))) f= {IntT {Neg {SZ)), IntT {Pos {S Z))) 

c = {IntT {Pos {SZ)), IntT {Neg {S Z))) g = {Int, IntT {Pos {S Z))) 

d = {IntT {Neg {SZ)),Int) h = {IntT {Pos {SZ)), IntT {Pos {S Z))) 



Figure 2: Boundary indices types for a two-dimensional grid with one-element boundary 



The components of an index are the individual elements of the index's tuple for each dimension. 
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example boundary definitions into BoundaryFun values, with explicit type signatures: 
[(-1, +1) -> o.ol 

•w {Static {XiNeg {SZ),Pos {S Z)) ^ 0.0)) 

:: {BoundaryFun {Dim d :* Dim d') {IntT {Neg {S Z)),IntT {Pos {S Z))) Double Static) 
l(*±, +2) g -> g! ! ! (i, 1)1 
^ {Dynamic {?i{{i,Pos {S {S Z))),g) ^ g \U {i,\))) 

:: {BoundaryFun {Dim d :* Dim d') {Int,IntT {Pos {S {S Z)))) Double Dynamic) 

Functions with GADT parameters require explicit type signatures 1 181 thus the boundary macro gener- 
ates such type signatures for the boundary functions it defines, of which we showed two examples above. 
The boundary index types and the type for static or dynamic boundaries can be easily generated from the 
syntax of a boundary definition during desugaring. However, the element type of a grid/boundary cannot 
be constructed without performing type inference. To simplify the implementation the user must provide 
a monomorphic type for the boundary values at the start of a boundary definition, as seen in the Laplace 
example where the Double type is specified. 

BoundaryFun values are composed via the BoundaryList structure which is similar to the List GADT 
seen earlier but with additional type-level information, defined: 

data BoundaryList b dyn lower upper d a where 

NilB :: BoundaryList Nil Static {Origin d) {Origin d) d a 
ConsB '.'.BoundaryFun d ix a dyn 

—7- BoundaryList b dyn' lower upper d a 
— >■ BoundaryList {Cons {AbsToRel ix) b) {Dynamism dyn dyn') 
{Lower ix lower) {Upper ix upper) d a 

The BoundaryList structure has six type-parameters encoding inductively-defined information about 
the boundary regions via the NilB constructor for the base case and ConsB constructor for the inductive 
case. The first type-parameter is a type-level list, defined using the Nil and Cons type constructors, 
which encodes the boundary indices defined by the BoundaryList. In the inductive case, the AbsToRel 
type family is applied to the boundary indices type ix of the BoundaryFun. AbsToRel converts a boundary 
indices type into relative index type. Boundary indices have components that are either absolute or 
relative indices. AbsToRel converts absolute indices or type Int, which are indices within the extent of a 
dimension, to zero-relative indices as an index within the extent of a dimension is neither greater than or 
less than the extent of a dimension. AbsToRel is defined: 

type family AbsToRel t 

type instance AbsToRel Int = IntT {Pos Z) - Int to zero {IntT {Pos Z)) 

type instance AbsToRel {IntT n) = IntT n — IntT to IntT 

type instance AbsToRel {a,b) = {AbsToRel a, AbsToRel b) — Two-dimensional case 

The type-level list of boundary indices is thus a list of relative indices with respect to the grid's lower 
or upper extent, where a zero-relative index component means any index within the grid's extent. This 
representation is useful for matching relative indices from indexing operations with the boundary infor- 



mation of a grid, to which we return in Section 3.6 1 



The remaining parameters to BoundaryList encode other information which is used by the implemen- 
tation but is not involved in enforcing safety. Briefly, dyn describes whether all boundary sub-definitions 
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are static or if at least one is dynamic. The base case is that all are Static. The inductive case applies the 
Dynamism type family, which is monotonic (in the two element domain {Static, Dynamic}), calculating 
the dynamic property for the entire boundary: 

type family Dynamism 1 1' 

type instance Dynamism Static Static = Static 

type instance Dynamism Static Dynamic = Dynamic 

type instance Dynamism Dynamic Static = Dynamic 

type instance Dynamism Dynamic Dynamic = Dynamic 

The lower and upper parameters encode the lower and upper bounds of the boundary regions, relative 
to the lower and upper bounds of the grid's extent, which is used by the implementation to construct the 
data array for a grid. The base case for both is the Origin for a particular dimensionality, which is the 
zero-relative index e.g. Origin {{Dim d) :* {Dim d')) = {IntT {Pos Z),IntT {Pos Z)). The inductive cases 
are calculated by the Lower and Upper type families which compute the lower and upper bound of a pair 
of boundary indices, defined as the minimum and maximum of each component of an index. 

Lastly, the d parameter of BoundaryList is the dimensionality of the grid for which this is the bound- 
ary, and a is the element type of the grid/boundary. 

In the Laplace example of Section|2j the type of the boundary definition, laplaceBoundary, is: 

BoundaryList 



{Cons {IntT {Neg {SZ)),IntT {Neg {S Z))) 


- (-1, 


-1) 


{Cons {IntT {Neg {SZ)),IntT {Pos Z)) 


-- (-1, 


*j) 


{Cons {IntT {Neg {SZ)),IntT {Pos {S Z))) 


-- (-1, 


+ 1) 


{Cons {IntT {Pos Z) , IntT {Neg {S Z))) 


- (*i, 


-1) 


{Cons {IntT {Pos Z), IntT {Pos {S Z))) 


- (*i, 


+ 1) 


{Cons {IntT {Pos {SZ)), IntT {Neg {SZ))) 


- (+1, 


-1) 


{Cons {IntT {Pos {SZ)), IntT {Pos Z)) 


- (+1, 


*j) 


{Cons {IntT {Pos {SZ)), IntT {Pos {S Z))) 


- (+1, 


+ 1) 



Nil)))))))) Static 

IntT {Neg {SZ)),IntT {Neg {S Z))) {IntT {Pos {SZ)),IntT {Pos {SZ))) 
Dim X :* Dim Y) Double 

Fortunately such a type is never constructed by an end-user, although it is possible for such a type to be 
seen as an error message (see Section [6] for further discussion). Note that for Dynamism, Lower, and 
Upper a corresponding value is not computed, thus these computations occur only at compile-time. 

In the original paper there was some mention of a boundary structure called a facets structure, this 
corresponds to the BoundaryList structure which has been elaborated here following recent research. 



3.5 Grids and Grid Constructors 

A BoundaryList structure can be passed to grid constructor to define the boundary of a grid. The first 
type parameter of a BoundaryList structure provides type-level information of the boundary it defines 
as a list of relative indices of the boundary regions. A grid constructed with a BoundaryList adopts this 
type-level boundary information in its type. 

The Grid data type has four type parameters: Grid d b dyn a, where J is a dimensionality type, b is 
the boundary information, dyn describes whether the grid's boundary is static or dynamic, and a is the 
element type of the grid. Grid is defined by the following GADT with just one constructor: 
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data Grid d b dyn a where 
Grid .: {1 Array U Array a) 

=^ {UArray {Index d) a) — Array of values 

Dimensionality d — Dimensionality term 

— )• Index d — Cursor ("current index") 

-> {Index d, Index d) — Lower and upper bounds of grid extent 

— BoundaryList b dyn lower upper da — Boundary definition 
— > Grid d b dyn a 

The type parameters b and dyn of BoundaryList, which encode boundary regions and dynamism 
information, are transferred to the constructed Grid. The lArray UArray a constraint is a consequence of 
the underlying implementation of grids as Haskell unboxed array data types. 

The Grid data constructor is hidden from the end-user. Instead a number of different constructor 
functions are provided, the most common of which are listGrid (seen earher) and grid, which have type: 

grid : : {lArray UArray a, Reifiable upper {Index d) , Reifiable lower {Index d)) ^ 

Dimensionality d — > Index d — t- Index d — t- [ {Index d,a)] 
BoundaryList b dyn lower upper d a ^ Grid d b dyn a 

listGrid:: {lArray UArray a, Reifiable upper {Index d), Reifiable lower {Index d)) =^ 
Dimensionality d — > Index d — > Index d [a] — )• 
BoundaryList b dyn lower upper da—^ Grid d b dyn a 

The first three parameters of both constructors are a dimensionality term, a lower-bound index, and an 
upper-bound index. The fourth parameter for listGrid is a list of elements which define the data of the 
grid, and for grid a list of index-element pairs. The fifth parameter of both is a boundary definition. 

Recall that relative indices are singleton types, thus the type uniquely determines the value. The 
Reifiable class provides functions for constructing values from relative index types. In this case, the 
upper and lower bounds of a boundary can be reified as absolute indices. Reifiable is used elsewhere in 
the implementation to construct relative index values, or /«f-valued relative indices, from a type. 

Ypnos provides two further constructors listGridNoBoundary and gridNoBoundary, not shown here. 
Both are similar to listGrid and grid but are not parameterised by a BoundaryList structure, returning a 
grid of type Grid d Nil Static a, where Nil boundary information implies no boundary. 



3.6 Grid Patterns, Indexing, and Safety 

Grid patterns are desugared into a number of relative indexing operations which are internally imple- 
mented without bounds-checks. The one- and two-dimension relative indexing operations have type: 

indexID :: Safe {IntT n) b ^ 

IntT n — > Int — > Grid {Dim d) b dyn a—^ a 

indexlD :: Safe {IntT n,IntT n') b => 

{IntT n,IntT n') — > {Int, Int) — > Grid {Dim d:* Dim d')b dyn a—^ a 

Note that, in both cases, the first parameter is a relative index and the second parameter is an absolute 
index. The fun macro generates at compile time both an inductively defined relative index, to produce 
the correct type constraints, and an /n?-valued relative index which is used to perform the actual access. 
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This second parameter is provided so that an inductive relative index does not have to be (expensively) 
converted into an Int representation for actual indexing at run-time. 

The Safe constraint enforces safety by requiring that a grid's boundary provides sufficient elements 
outside of a grid such that a relative index has a defined value if accessed from anywhere within the 
grid. As a first approximation Safe is essentially defined as membership of a relative index to the list of 
boundary regions for a grid, which are the relative indices of boundary regions from the edge of the grid. 

The InBoundary class is used by Safe as a predicate to test whether a relative index matches a relative 
position of a boundary region in the list of boundary regions: 

class InBoundary i ixs 

instance InBoundary i {Cons i ixs) — List head matches 

instance InBoundary i ixs =^ InBoundary i {Cons i' ixs) — List head does not match, recurse 

For example, consider a two-dimensional grid with a one-element deep boundary: 
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The relative index (0, +1), represented by type {IntT {Pos Z),IntT {Pos {S Z))), requires the g boundary 
to be safe. The boundary function for g pattern matches on boundary indices of type {Int, IntT {Pos {SZ))). 
As AbsToRel maps Int to a zero-relative index in a boundary indices type (see Section [34| ), the relative 
position of g is encoded in the type-level boundary information as: {IntT {Pos Z),IntT {Pos {S Z))) i.e. 
the same representation as the relative index (0, + 1). Thus, our first approximation of safety as member- 
ship of a relative index to the list of boundary regions would in this situation be sound. However, other 
relative indices require several boundary regions to be defined. For example, safety of the relative index 
(+1, +1) requires the boundaries g, h, and e. 

Safe is implemented in the following way: for every relative index we must test the space of possible 
indices it may access outside of a grid boundary. For the one-dimensional case Safe is defined as such: 

class Safe i b 

instance Safe {IntT {Pos Z))b 

instance {Safe {IntT {Pred n)) b, InBoundary {IntT n) b) =^ Safe {IntT n) b 

The first instance defines that zero-relative indices are always safe. The second instance defines that a 
relative index is safe if it is a member of the boundary regions for a grid and if its predecessor is also 
safe. Pred is defined: 



type family Pred n 

type instance Pred {Neg {S {S n))) = Neg {S n) - Pred —{n + \) = —n 

type instance Pred {Neg {S Z)) = Pos Z - Pred — 1 =0 

type instance Pred {Pos Z) = Pos Z - Pred =0 

type instance Pred {Pos {S n)) = Pos n — Pred +(« + 1) = +n 

Thus, for a relative index of say (—2) there must be a boundary region for (—2) and (—1), and for a 
relative index of (+2) there must be a boundary region (+2) and (+1). Pred is thus the predecessor of 
a relative index, approaching zero from both the negative and positive directions. 
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For the two-dimensional case Safe is defined similarly: 
instance Safe {IntT {Pos Z),IntT (Pos Z)) b 

instance {Safe {IntT {Pred n),IntT n') b,Safe {IntT n, IntT {Pred n')) b, 
InBoundary {IntT n,IntT n') b) =^ Safe {IntT n,IntT n') b 

Again, the first instance states that a zero-relative index is safe. The second instance states that a relative 
index is safe if it is a member of the boundary regions for a grid and if its predecessors in both dimensions 
are safe. Appendix |C]provides a proof-sketch of the soundness of enforcing safe indexing via Safe. 
The Laplace operator of the introductory example is desugared by the fun macro into the code shown 



in Figure 3(a) with the type shown in Figure 3(b) The Safe type constraints of laplace thus encode its 
relative access pattern and constrain the grid's boundary regions such that the grid at least satisfies the 
relative indices of the grid pattern. 

laplace g= {indexID {Neg {SZ),PosZ) (-1,0) g) {Safe {IntT {Neg {SZ)),IntT {Pos Z)) b, 
+ {indexlD {Pos {SZ),PosZ) (1,0) g) Safe {IntT {Pos {SZ)),IntT {Pos Z)) b, 
+ {indexlD {Pos Z,Pos {S Z)) {0,1) g) Safe {IntT {Pos Z) , IntT {Pos {SZ)))b, 
+ {indexlD {Pos Z,Neg {S Z)) {0,-\) g) Safe {IntT {Pos Z) , IntT {Neg {SZ)))b, 
- 4 * {indexlD {Pos Z, Pos Z) {0,0) g) Safe {IntT {Pos Z) , IntT {Pos Z))b, Num a) 

=^ Grid {Dim d:* Dim d')b dyn a ^ a 

(a) Code (b) Type 

Figure 3: Desugared Laplace example 



3.7 Applying Stencil Functions to Grids 

In the Laplace example, the laplace stencil function is applied to a grid using runA. The runA function 
is overloaded on the type parameter of a grid which encodes whether the grid's boundary is static or 
dynamic, and is defined by the type class: 

class RunGridA dyn where 

runA : : {Grid d b dyn a ^ a) ^ Grid d b dyn a — t- Grid d b dyn a 

The instance for Dynamic applies the stencil function to a grid and then, from the BoundaryList of the 
parameter grid, recomputes the boundary elements. The instance for Static applies the stencil function 
but preserves the boundary elements as they do not require recomputation from the updated grid values. 

A non-overloaded operation, called run, is also provided which applies stencil functions which 
change the element type of a grid, with type: 

run :: {lArray UArray y) =^ {Grid d b dyn x — j) — )■ Grid d b dyn x — )■ Grid d Nil Static y 

The element type of the parameter grid differs from the element type of the return grid therefore the 
boundary information of the parameter grid must be discarded as the boundary elements are of type x but 
the grid elements are now of type y. The boundary information on the return grid is thus empty i.e. Nil. 

As a theoretical aside: Ypnos grid's are comonadic structures, where run and runA provide the 
extension operation. The structuring of Ypnos by comonads was discussed in our earlier paper ifTTl . 
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4 Results & Analysis 

We provide a quantitative comparison of Ypnos versus Haskell with two benchmark programs of the two- 
dimensional Laplace operator and a Laplacian of Gaussian operator, which has a larger access pattern. 
All experiments were performed on an Intel Core 2 Duo 2Ghz, with 2GB DDR3 memory, under Mac 

05 X version 10.5.8 using GHC 7.0.l|^ All code was compiled using the -02 flag for the highest level 
of "non-dangerous" (i.e. no-worse performance) optimisations. The Ypnos library is imported into a 
program via an #include statement, allowing whole-program compilation for better optimisation. 

All experiments are performed on a grid of size 512x512. The execution time of the whole program 
is measured for one iteration of the stencil computation and for 101 iterations of the stencil computation. 
The mean time per iteration is computed by the difference of these two results divided by a 100. The 
mean of ten runs is taken with all results given to four significant figures. 



Laplace operator This benchmark uses the Laplace example of Section [2] 





Haskell 


Ypnos 


1 iteration (s) 


3.957 


5.179 


101 iterations (s) 


6.297 


7.640 


Mean time per iteration (s) 


0.0234 


0.0246 



Ratio of mean time per iteration for Ypnos over Haskell = 



(7.640-5.179)/100 0.0246 



1.051 



(6.297 -3.957)/100 0.0234 
i.e. the Ypnos implementation is approximately 5% slower per iteration than the Haskell implementation. 



Laplacian of Gaussian The Laplacian of Gaussian operation combines Laplace and Gaussian convo- 
lutions. We use here a 5 x 5 Laplacian of Gaussian convolution operator with coefficients tSJ : 
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Haskell 


Ypnos 


1 iteration (s) 


3.962 


5.195 


101 iterations (s) 


7.521 


8.662 


Mean time per iteration (s) 


0.0356 


0.0347 



Ratio of mean time per iteration for Ypnos over Haskell : 



0.0347 



■ 0.975 



0.0356 

i.e. the Ypnos implementation is roughly 3% faster per iteration than the Haskell implementation. 



Discussion In terms of whole program execution time, Ypnos performance is worse than Haskell, 
mostly due to the overhead of the general grid and boundary types and the cost of constructing boundary 
elements from boundary definitions. However, per iteration we see that Ypnos performance is com- 
parable to that of Haskell. In the Laplace example, performance is slightly worse, where the benefits 
of bounds-check elimination do not offset any overhead incurred by the Ypnos infrastructure. Given 



" ^http : / /www ■ haskell . org /ghc/ download._ghc_7_0_ 1 
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that safe indexing has an overhead of roughly 20 — 25% over unsafe indexing, for the two-dimensional 
Laplace operator on a 512 x 512 grid (see Appendix [A]), then the overhead of Ypnos in the Laplace exper- 
iment is roughly 25 — 30% per iteration. For the Laplace of Gaussians example performance is however 
slightly better (roughly 3%). Given intensive data access the benefits of bounds-check elimination begin 
to overshadow any overheads from Ypnos. Given a larger program with more stencil computations we 
conjecture that performance of Ypnos could considerably surpass that of Haskell's. 

Whilst Ypnos incurs some overhead it provides the additional benefits of static correctness guaran- 
tees, the comparable ease of writing stencil computations compared to Haskell or some other language, 
and the generality of extending easily to higher-dimensional problems. We are currently in the process of 
trying larger benchmarks in Ypnos, such as a Navier-Stokes fluid simulator, and implementing parallel 
implementations of the stencil-function application operations. 

5 Related Work 

The benefit to performance from eliminating or reducing bounds checking is well known, first appearing 
in lUS*] where various optimisations were used to reduce the cost of bounds checking in loops. There 
has been research on analysis and transformation techniques for eliminating bounds checks in various 
languages, such as in Java |l3l|26l. The idea of using types to ensure safety of array operations thus 
allowing array bounds checking to be eliminated is also not new. In Xi's thesis. Dependent Types in 
Practical Programming, dependently-typed array operations enforce safety of array indexing to allow 
bounds-check elimination in a general setting [27 1. The approach in Ypnos is similar, but using the more 
lightweight approach of GADTs. Recent work by Swierstra and Altenkirch has modelled distributed 
array access in the dependently-typed language Agda Il25l with safe indexing. Sheard et. al provide a 
good discussion of the relation between GADTs and dependent-types |23|. 

The approach of using GADTs, type families, and classes to encode invariants of a DSL is not 
novel. For example, Guillemette and Monnier implement a compiler for System F inside Haskell using 
GADTs for a typed-representation of System F terms 161 . Various transformations on the typed-terms 
are mechanised via type families, such as substitution and CPS conversion. Our approach is similar, 
although we do not use GADTs to construct typed syntax trees, instead our embedding is shallow, where 
Ypnos terms are either Haskell expressions or are desugared into Haskell expressions, in the case of 
grid patterns and boundary definitions. Sackman and Eisenbach use GADTs along with type classes to 
encode and enforce invariants of DSLs at the type-level tlTl . 

In 111], type classes and regular ADTs are used to encode type-level natural numbers and lists, 
instead of GADTs. Ypnos could have used a similar scheme but we found GADTs to be more concise, 
clearer, and easier to write functions over. The class/ADT approach requires any function on the data 
types to be written via a type class with each pattern matching case encoded as a class instance. 

We know of two approaches to stencil programming in Haskell that are closely related to Ypnos: 
PASTHA [ 13] and Repa [ 10. 14|. 

Repa provides a library of higher-order functions for programming parallel regular, multi-dimensional 
arrays in Haskell |[T0|| . Repa allows shape polymorphic array operations via a type-level representation of 
shape, or dimensionality, that is somewhat similar to the dimensionality types of Ypnos, although Repa 
does not name its dimensions (the reason for named dimensions in Ypnos is discussed in Section [6]l. 

Latest work on Repa [14] provides a data type for describing stencils in terms of stencil size and a 
function from indices within the stencil's range to values. The current implementation allows a constant 
or wrapped boundary to be specified which permits a bounds-check free implementation of a stencil 
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application function. Ypnos allows more fine-grained boundary behaviour than the current Repa imple- 
mentation via its flexible boundary definitions and type-level safety encoding. However, it seems likely 
Repa will be extended with further boundary options in the future. Whilst the Repa implementation is 
currently specialised to two-dimensional arrays and stencils, Ypnos supports higher-dimensionality sten- 
cils via the grid pattern syntax, which is also easy to read and write. In the future, Repa may prove a 
useful implementation platform for parallel Ypnos programs due to Repa's very good parallel support. 

PASTHA is a similar library for parallelising stencil computations in Haskell [13|. Stencils in 
PASTHA are described via a data structure and are restricted to relative indexing in two spatial di- 
mensions but can also index the elements of previous iterations. The approach of Ypnos is much more 
general and provides more static guarantees about correctness. 

There are many other languages designed for fast and parallel array programming such as Chapel 
II2I . Titanium ll30l . and Single Assignment C (SAC) ll22l . Ypnos differs from these in that it is suffi- 
ciently restricted to a problem domain, and sufficiently expressive within that problem domain, that all 
information required for guaranteeing safety, optimisation, and parallelisation is decidable and available 
statically without the need for complex compiler analysis and transformation. 

Titanium, a parallel dialect of Java developed for over a decade, provides support for stencil computa- 
tions via array loop constructs (f oreach) and operations on indices (such as translations) ll30ll . Bounds- 
checking can be eliminated in some cases via compiler analyses and transformation. Any checks not 
eliminated provide runtime out-of-bounds errors. A special compilation mode allows bounds-checks to 
be removed entirely |30|, which, whilst a useful software engineering technique for production code, 
does not improve the robustness or inherent efficiency of the language. 

There has been much work on the optimisation of stencil computations particularly in the context of 
optimising cache performance (see for example [9]) and in the context of parallel implementations ||5l 
[T2]| . Ypnos does not currently use any such optimisation techniques but could certainly be improved by 
the use of more sophisticated implementations of the stencil application functions. Various optimisation 
techniques such a loop tiling, skewing, or specialised data layouts, could be tuned from the symbolic 
information about access patterns statically provided by grid patterns. 

6 Further Work 

The first Ypnos paper lITTl discussed Ypnos 's implementation agnosticism, where the implementation of 
the language constructs is parameterisable, allowing implementations tailored to different architectures 
and execution strategies. In this paper we did not discuss back-end parameterisability, but instead used 
a single implementation for unboxed arrays. Currently we are in the process of generalising the typed 
approach of this paper to a parameterisable back-end. We discuss briefly here other areas of further work. 

Slices One of the main reasons that Ypnos was designed with named dimensions, e.g. X, Y, etc. was 
to ease multi-dimensional programming. Named dimensions are akin to using records to provide named 
rather than positional parameters to a function or procedure in a language. 

Further work is to add array slicing operations to Ypnos, for which the named dimension are partic- 
ularly useful. Slicing can be added to Ypnos by varying the behaviour of grid patterns depending on the 
dimensionality of the grid matched upon. For example, consider the hypothetical type of the following 
function which uses a one-dimensional grid pattern on the X dimension: 

foo : : Grid (X : * d) b dyn a -> (Grid d b dyn a, Grid d b dyn a. Grid d b dyn a) 
foo = [fun I X: I 1 @c r I -> (1, c, r) |] 
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Applying f oo to a grid of type Grid {Dim X:* Dim Y) b dyn a would yield a triple of successive grid 
slices in the Y dimension; applying f oo to a one-dimensional grid of type GridX b dyn a would yield a 
triple of single a values, requiring a unit for the dimension tensor e.g. NoDim such that: 

NoDim :* DimX = DimX = DimX :* NoDim 

Additionally, Grid NoDim b dyn a = a. Named dimensions allow f oo to be applied to a grid of the type 
Grid {Dim Y:* DimX) b dyn a yielding the same slicing on the X dimension. 

Grid slicing eases higher-dimensional programming as grid patterns need not match all dimensions at 
once, but could slice a grid into lower-dimensional subgrids to be operated on by further stencil functions. 

Inductive Tuples Ypnos is not fully general in the dimensionality of its grids due to the non-inductive 
nature of Haskell tuple types. The implementation specifies operations usually up to four dimensions, 
however it is conceivable that some applications might require higher-dimensionality. 

Inductively defined tuples, essentially lists, could be used as indices to provide a fully general imple- 
mentation, however such a scheme is inefficient. In preliminary experiments we found that indexing with 
even just a two-element list took roughly four times as long per iteration than indexing with a tuple of 
two elements for a two-dimensional Laplace operation ona512x512 grid (see Appendix [A| for details). 

An ideal system would use an inductively defined implementation with a compiler transformation 
mapping inductive lists to fixed-size tuple types in the compiled Haskell code. 

Errors A well-known problem with building embedded DSLs in statically-typed languages is the gen- 
eration of type errors which are lengthy and confusing for the end-user. In Ypnos, if a stencil function is 
applied to a grid which does not have sufficient boundary regions for safe indexing then a type error is 
produced. Unfortunately this type error is long, confusing, and intimidating. 

A potential remedy for such problems could be to allow error messages from a compiler to be passed 
to a pre-processor function before they are shown to a user. For example, GHC could be extended 
to allow a function to be specified which, in the event of a type-error, is passed an AST of the error 
message, returning a more helpful message to the end-user. 

7 Conclusion and Reflection 

In our work on Ypnos our slogan has been: to develop languages to improve the four "Rs" of programs: 
reading, (w)riting, reasoning, and running. That is, to improve the readability of programs in the lan- 
guage, to improve the ease with which a program can be expressed in the language, to increase the ability 
to reason about programs, and to provide an efficient implementation. In a general purpose language it is 
harder to excel in all four areas at once. Well-designed DSLs however can simultaneously provide prob- 
lem specific expressivity, improve programmer productivity, provide stronger reasoning, and provide 
more appropriate optimisation than general-purpose languages [24]. 

In Ypnos, problem specific expressivity is provided by grid patterns and boundary definitions. Gen- 
eral array indexing constructions, as found in most languages, have low information-content with re- 
spect to program properties. Program properties relating to indexing must often be inferred via analysis, 
which may be undecidable in general. Grid patterns and boundary definitions provide a restricted but 
information-rich method of indexing. We believe that Ypnos's syntax eases reading and writing of stencil 
computations, although this is difficult to measure objectively. It is however easier to quantify the effect 
on reasoning and running. This paper has shown that grid patterns and boundary regions provide enough 
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information to statically prove safety of Ypnos programs, which also permits optimisation via bounds- 
check elimination. More complex optimisations, such as tiling, skewing, and cache sensitive data-layout, 
could be performed since grid patterns provide decidable static information about array access. 

Other embedded DSLs have used the type system of the host language to enforce invariants of the 
DSL (of which a few were mentioned in Section[5]). In our approach GADTs were used to lift data-level 
information to the type-level; type families were used to manipulate and compose such information; and 
type classes were used to enforce language properties/invariants based on this information. 

One lesson learnt was in leveraging the semantics of type classes. A first attempt at encoding safety 
for Ypnos in Haskell used a special stencil data-type for grid patterns that included all relative index 
information in its type. Safety was then enforced by unification between this stencil type and a boundary 
type. However, since there is no particular order in which boundary definitions should be defined, a type- 
level normalisation algorithm, essentially a kind of sorting, was required for unification of boundary and 
stencil types. This approach was extremely complicated and error-prone. Type class constraints proved 
to be much more useful in the end, as they comprise a conjunction of constraints that is implicitly asso- 
ciative and commutative, thus ordering is not relevant. Safety in Ypnos is fortunately easily expressed 
via a conjunction of individual safety constraints (as shown in Appendix [C]). Encoding disjunctions of 
invariants would be more difficult and is not something we have experimented with or required yet. 

Another lesson learnt is that a working type-level encoding of a language invariant does not imply 
soundness of the encoding. For a time we had a type-level encoding of safety which appeared correct; 
it was simple and elegant, rejected the programs we thought it should reject, and accepted the programs 
we thought it should accept. Unfortunately it was unsound, accepting some programs that should have 
been rejected. Overconfidence in the infrastructure meant it was a while before this unsoundness was 
discovered. Writing a proof of soundness beforehand is preferable, and may even give an indication of 
how to implement a type-level encoding, as does the soundness proof-sketch in Appendix [C| 

There are certainly many more areas of programming that could benefit from DSLs. Structured grids, 
or stencil computations, as discussed in this paper, are amongst a number of programming patterns that 
have been identified as key motifs in parallel programming [1]. We have already begun considering 
extensions to Ypnos to support unstructured grids, also called meshes. We would be interested to see if 
other restricted syntactic forms such as grid patterns could be invented for other programming patterns. 
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A Supporting Experiments 

Each experiment is set up and measured as the experiments in Section |4] 

Comparing Safe & Unsafe Indexing A 2D-Laplace operation was applied toa512x512 image using 
both safe and unsafe memory access to test the effect on execution performance. The results were: 





Safe indexing 


Unsafe indexing 


1 iteration (s) 


3.982 


3.952 


101 iterations (s) 


6.328 


5.865 


Mean time per iteration (s) 


0.0234 


0.0191 



The overhead of safe indexing is approximately g^^jf^y = 1.23 i.e. approximately 20 — 25%. 

Comparing Tuple Indices & Inductively-defined Indices A 2D-Laplace operation was applied to a 
512x512 image using an array indexed by a pair of /nfs and an array indexed by an inductively defined 
Int list structure of length two. 





Tuples 


Lists 


1 iteration (s) 


3.982 


4.133 


101 iterations (s) 


6.328 


13.941 


Mean time per iteration (s) 


0.0234 


0.0981 



0.0981 
0.0234 



■ 4. 19, therefore the overhead of the list index is considerable at approximately 300%. 



B Haskell and GHC Type-System Features 

Type Classes Type classes in Haskell provide a mechanism for overloading, also known as ad-hoc 
polymorphism or type-indexed functions fV]. Consider the equality operation (==). Many types have 
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a well-defined notion of equality, thus an overloaded (==) operator is useful, where the argument type 
determines the actual equality operation used. The equality type class is defined: 

class Eq a where 

(==) :: a — )• a — )• Bool 

A type is declared a member of a class by an instance definition, which provides definitions of the class 
functions for a particular type. For example: 

instance Eq Int where 

x==y = eqint xy 

where eqInt is the built-in equality operation for Int. By this instance definition, Int is a member of Eq. 

Usage of (==) on polymorphically-typed arguments imposes a type-class constraint on the poly- 
morphic type of the arguments. Type-class constraints enforce that a type is an instance of a class and 
are written on the left-hand side of a type, preceding an arrow =^. For example, the following function: 

f xy = x==y 

has type/ "Eqa^a^ a^ Bool, where Eq a is the type-class constraint that a be a member of Eq. 

GADTs Generalised algebraic data types, or GADTs, |fT9l [TSll have become a powerful technique 
for dependent-type like programming in Haskell. Consider the following algebraic data type in Haskell 
which encodes a simple term calculus of products and projections over some type: 

data Term a = Pair (Term a) (Term a) \ Est (Term a) \ Snd (Term a) \ Val a 

The ADT is polymorphic in the type a and is recursive. For each tag in a the definition of an ADT (e.g. 
Pair, Est, etc.) a constructor is generated; e.g. for the first two data constructors of Term: 

Pair:: Term a — t- Term a — ?■ Term a 
Est : : Term a — )■ Term a 

Thus, every constructor returns a value of type Term a. We can define another ADT of values Val and 
write an evaluator from Term to Val: 

data Val a = ValPair (Val a) (Val a) \ ValConst a 

eval:: Term a — )• Val a 

eval (Pair xy) = ValPair (eval x) (eval y) 

eval (Est x) = case (eval x) of ValPair ab ^ a; _ — )• error "whoops " 
eval (Snd x) = case (eval x) of ValPair a b ^ b; _—)• error "whoops" 
eval (Valx) = ValConst x 

Unfortunately, the Term ADT allows non-sensical terms in the Term calculus to be constructed, such as 
Est (Val 1), therefore error handling cases must be included for eval on Est and Snd. 

The crucial difference between ADTs and GADTs is that GADTs specify the full type of a data 
constructor, allowing the result type of the constructor to have arbitrary type-parameters for the GADT's 
type constructor. For example. Term can be rewritten as the following on the left-hand side: 
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data Term t where 



Pair 
Fst 
Snd 
Val 



: Term a — t- Term b — t- Term 
: Term {a,b) — )■ Term a 
: Term {a,b) — )• Term b 
: a —7- Term a 



{a,b) 



eval:: Term a ^ a 
eval {Pairxy) = {eval eval y) 
eval {Fst x) = fst {eval x) 
eval {Snd x) = snd {eval x) 
eval {Val x) =x 



The type parameter of Term encodes the type of a term as a Haskell type. Because a term type is 
known, eval can be rewritten as on the right-hand side, where a well-typed Haskell value is returned as 
opposed to the Val datatype encoding Term values used previously. Furthermore, eval cannot be applied 
to malformed Term terms, thus error handling cases are unnecessary. GADTs thus allow a form of 
lightweight dependent-typing by permitting types to depend on data constructors. 



Type Families Type families in Haskell, also called type-indexed families of types, are simple type-level 
functions providing a limited form of computation at the type-level, evaluated during type checking Bll . 
Type families describe a number of rewrite rules from types to types, consisting of a head declaration 
specifying the name and arity of the type family and a number of instance declarations defining the 
rewrite rules. For example, the following type family provides a projection function on pair types: 

type family Fst t 

type instance Fst {a,b) = a 

Type families are open allowing further instances to be defined throughout a program or in another 
module. Instances of a family are therefore unordered thus a application of a family to an argument does 
not result in ordered pattern matching of the argument against the family's instances. Consequently, to 
preserve uniqueness of typing, type family instances must not overlap or at least must be confluent i.e. if 
there are two possible rewrites for a type family then the rewrites are equivalent. 

Type families may be recursive, as long as the size of the type parameters of the recursive call are less 
than the size of the type parameters in the instance making the recursive call. For example, the following 



defines an append operation on the type-level list representation shown in Section 3.3 



type family Append x y 

type instance Append Nil z = z 

type instance Append {Cons xy) Z'- 



Cons X {Append y z) 



C Soundness Proof 

The safe-indexing invariant of Ypnos is sound if well-typed programs cannot index undefined elements, 
that is, any out-of-bounds access always has a defined value. We provide a (semi-formal) proof of 
soundness here for two-dimensional grids, although the proof generalises easily to arbitrary dimensions. 

Consider a grid of two-dimensions with finite extent (0,0) to {N,M) (exclusive) and a relative index 
{i,j) accessed by some stencil function. We assume for this proof that / and j are both positive relative 
indices without loss of generality as all definitions are symmetric in the sign of relative indices. 

Let V be a predicate on index ranges which denotes indices which have a defined value. As an axiom, 
all indices inside the grid's extent have a defined value, thus: 



(axiom) V[0,N][0,M] 



(1) 
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Let 5 be a predicate of the safe relative indices for a grid where S{x,y) means {x,y) is safe. For the 
positive relative index (/, j), S and V are related thus: 

V[0,N + i][0,M + j]^Sii,j) (2) 

Axiom ([T]l is therefore equivalent to 5(0,0) i.e. zero-relative indices are always safe. 

The range of V can be separated into a conjunction of subranges, thus we can express ([T]l as: 

y[0,A^][0,M] 
A V[0,N][M,M + j-\] 

A V[N,N + i-\][0,M] ^ ''^^ 
A V[N,N + i][M,M + j] 

(Note: by ([T]l we could eliminate the first conjunct) Since logical conjunction is involutive (i.e. A A A =A) 
we can overlap the regions of V , thus we could rewrite Q as: 

y[0,A^ + /-l][0,M + j] 
A V[Q,N + i][Q,M + j-\] ^ S{iJ) (4) 

A V[N + i,N + i][M + j,M + i] 

Now consider another predicate of defined boundary regions where B{x,y) means that the boundary 
region {x,y) is defined. The boundary region predicate B is related to V in the following way: 

B{+x,+y) ^V[N + x,N + x][M + y,M + y] B{+x,*) ^V[N + x,N + x\[Q,M] 

B{^x-y)^V[N + x,N + x][-y,-y] B{*^,+y) ^V[0,N][M + y,M + y] ... 

B{-x,+y)^V[-x,-x][M+y,M+y] B{-x,*v) ^V[-x,-x][0,M] 

B{-x,-y)^V[-x,-x][-y,-y] B{*v,-y) ^V[0,N][-y,-y] 

Boundary regions thus map to a single defined index e.g. V[N + x,N + x][N + y,N + y], or to a. range 
over the whole inner extent in one dimension with the other dimension fixed at a single index outside of 
the extent e.g. V[N + x,N + x][0,M] orV[0,N][-y,-y]. 

By Q we can replace V[N + i,N + i] [M + j,M + j] in Q withB(/,7), thus: 

V[0,N + i-\][0,M + j] 
A y[0,A^ + /][0,M + j-l] ^ SiiJ) (6) 
A BiiJ) 

We can eliminate V from ([6]) entirely by the relation of V to 5 Q: 

Sii-\,j) A 5(/,j-l) A B{i,j) ^ SiiJ) (7) 

Equation ^ thus specifies a kind of recurrence relation on S, which will eventually reach the axiomatic 
base case 5(0,0). The definition of Safe is exactly the predicate S as defined by the recurrence (|7]) where 
InBoundary is the predicate B: 

instance {Safe {IntT (Pred n),IntT n') b,Safe [IntT n,IntT {Pred n')) b, 
InBoundary (IntT n,IntT n') b) =^ Safe {IntT n,IntT n') b 

The axiom ([T]), 5(0,0) is satisfied by the case that all zero-relative indices are safe: 

instance Safe {IntT {Pos Z),IntT {Pos Z)) b 

By induction Pred reduces relative indices towards zero, acting as the minus operation in (|7]), and is 
symmetrical in its treatment of negative and positive relative indices. 
Safe thus provides a sound encoding of safe-indexing for grids. □ 



